Variation of Word Frequencies across Genre Classification Tasks
Abstract
This paper examines automated genre classification of text documents and its role in enabling digital libraries and other repositories to manage digital documents effectively. Genre classification, which narrows down the possible structure of a document, is a valuable step towards the general automatic extraction of semantic metadata, which is essential to the efficient management and use of digital objects. In this report, we analyse word frequencies in different genre classes in an effort to understand the distinctions between independent classification tasks. In particular, we examine automated experiments on thirty-one genre classes to determine the relationship between word frequency metrics and their significance for classification in varying environments.
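As a rough illustration of what a per-genre word frequency analysis can look like, the sketch below averages per-document relative word frequencies within each genre. It is not the paper's method: the function names, the tokenisation rule, the genre labels and the toy texts are all assumptions made for the example, and the abstract does not specify which frequency metrics the authors used.

```python
import re
from collections import Counter


def relative_frequencies(text):
    """Map each word in `text` to its share of all word tokens."""
    # Assumed tokenisation: lowercase runs of letters and apostrophes.
    tokens = re.findall(r"[a-z']+", text.lower())
    total = len(tokens)
    if total == 0:
        return {}
    return {word: count / total for word, count in Counter(tokens).items()}


def genre_profiles(labelled_docs):
    """Average per-document relative frequencies within each genre.

    `labelled_docs` is an iterable of (genre, text) pairs.
    """
    sums = {}               # genre -> Counter of summed relative frequencies
    doc_counts = Counter()  # genre -> number of documents seen
    for genre, text in labelled_docs:
        doc_counts[genre] += 1
        genre_sums = sums.setdefault(genre, Counter())
        for word, freq in relative_frequencies(text).items():
            genre_sums[word] += freq
    return {
        genre: {word: s / doc_counts[genre] for word, s in word_sums.items()}
        for genre, word_sums in sums.items()
    }


# Hypothetical toy corpus; the genre labels are invented for illustration.
docs = [
    ("abstract", "we present a method and we evaluate the method"),
    ("abstract", "we show the results"),
    ("recipe", "mix the flour and bake the cake"),
]
profiles = genre_profiles(docs)
```

Comparing entries such as `profiles["abstract"].get("we", 0.0)` and `profiles["recipe"].get("we", 0.0)` gives a first impression of which words separate two genres. Averaging per-document frequencies, rather than pooling all tokens, keeps long documents from dominating a genre's profile.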
Similar resources
Linguistic Profiling of Texts Across Textual Genres and Readability Levels. An Exploratory Study on Italian Fictional Prose
In this paper we present a case study focusing on the literature genre, in particular on Italian fictional prose, aimed at identifying the features characterizing this text type. Identified features were tested in two classification tasks, i.e. by genre and by readability, with promising results. Interestingly, the same multi-level set of linguistic features turned out to reliably capture varia...
EFL Writing Styles across Personality Traits and Gender: A Case for Iranian Academic Context
The ways individuals use words can reflect basic psychological processes, including clues to their thoughts, feelings, perceptions, and personality. This paper seeks to determine whether there is a relationship between Iranian EFL learners' writing styles and their personality and gender. It focuses on gender and two key dimensions of personality (Neuroticism and Extroversion), which were asse...
Predictive Power of Involvement Load Hypothesis and Technique Feature Analysis across L2 Vocabulary Learning Tasks
Involvement Load Hypothesis (ILH) and Technique Feature Analysis (TFA) are two frameworks which operationalize depth of processing of a vocabulary learning task. However, there is a dearth of research comparing the predictive power of the ILH and the TFA across second language (L2) vocabulary learning tasks. The present study, therefore, aimed to examine this issue across four vocabulary learning...
Perform Three Data Mining Tasks with Crowdsourcing Process
In data mining studies, because feature selection is too complex to carry out by hand, some of the labeling has to be sent to workers through crowdsourcing. The process of outsourcing data mining tasks to users is often handled by software systems without enough knowledge of the users' age or place of residence. Uncertainty about the performance of virtual user...
Text Genre Detection Using Common Word Frequencies
In this paper we present a method for detecting the text genre quickly and easily following an approach originally proposed in authorship attribution studies which uses as style markers the frequencies of occurrence of the most frequent words in a training corpus (Burrows, 1992). In contrast to this approach we use the frequencies of occurrence of the most frequent words of the entire written l...